Python: information-flow control prompt injection defense by eavanvalkenburg · Pull Request #5331 · microsoft/agent-framework

eavanvalkenburg · 2026-04-17T11:39:04Z

Motivation and Context

This draft PR brings the Python information-flow control prompt injection defense work from feature/python-fides toward main for review and integration.

Description

adds the FIDES information-flow control security model for tracking integrity and confidentiality through agent execution
introduces secure-agent middleware, secure tools, and context-provider-based configuration for policy enforcement and hidden-content handling
includes supporting samples, docs, and follow-up fixes needed to keep the Python workspace validation suite green

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

moonbox3 · 2026-04-17T11:41:25Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
packages/core/agent_framework
_feature_stage.py	117	8	93%	99, 156, 167, 174, 181, 188, 207, 235
_tools.py	1014	81	92%	219–220, 395, 397, 410, 435–437, 445, 463, 477, 484, 491, 514, 516, 523, 531, 650, 690–692, 694, 700, 752–754, 779, 805, 809, 847–849, 853, 875, 1017–1023, 1059, 1071, 1073, 1075, 1078–1081, 1102, 1106, 1110, 1124–1126, 1473, 1559, 1587, 1609, 1613, 1743, 1747, 1793, 1854–1855, 1958, 2011, 2031, 2033, 2089, 2152, 2324–2325, 2345, 2401–2402, 2540–2541, 2608, 2613, 2620
observability.py	769	80	89%	378, 380–381, 384, 387, 390–391, 396–397, 403–404, 410–411, 418, 420–422, 425–427, 432–433, 439–440, 446–447, 454, 611–612, 740, 744–746, 748, 752–753, 757, 795, 797, 808–810, 812–814, 818, 826, 950–951, 1113, 1355–1356, 1461–1466, 1473–1476, 1480–1488, 1495, 1583, 1623–1624, 1767, 1903, 2100, 2318, 2320
security.py	705	173	75%	437–438, 456, 459–460, 538–539, 541, 589, 601, 617, 634, 701–702, 705, 709, 716–717, 719, 726–731, 733, 735–742, 744, 747–751, 753–755, 757–758, 761–762, 764–765, 768–770, 772–776, 779–783, 785, 966, 972–973, 978, 980–981, 1011–1012, 1040–1048, 1069, 1172–1173, 1204–1205, 1230–1231, 1235–1236, 1270–1273, 1355–1356, 1361–1366, 1371–1373, 1704, 1709–1710, 1714, 1718–1720, 1769, 1781, 1783, 1800, 1821, 1832, 1858, 1899–1900, 1925, 2022, 2050–2052, 2087–2089, 2097, 2105, 2352–2353, 2355, 2357–2359, 2362, 2374, 2380–2381, 2383, 2414, 2418–2421, 2423, 2537, 2540–2543, 2546–2547, 2549, 2551, 2553, 2556–2560, 2567, 2571, 2582–2583, 2585, 2587–2589, 2650, 2660–2661
TOTAL	31789	3762	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
6350	30 💤	0 ❌	0 🔥	1m 47s ⏱️

Copilot

Pull request overview

This PR brings the Python FIDES information-flow control (integrity + confidentiality) prompt-injection/data-exfiltration defense closer to main, adding core security primitives/middleware, DevUI support for policy-violation approvals, and end-to-end samples/docs/tests.

Changes:

Adds FIDES security samples and documentation (quick start + developer guide + implementation summary + ADR).
Extends DevUI mapping/execution to round-trip policy-violation metadata through approval events.
Adjusts core function-invocation plumbing and test suites to support the new approval/policy flow and keep validation green.

Reviewed changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
python/samples/02-agents/security/repo_confidentiality_example.py	New sample demonstrating confidentiality labels + exfiltration prevention.
python/samples/02-agents/security/email_security_example.py	New sample demonstrating integrity taint handling + quarantined processing.
python/samples/02-agents/security/README.md	New quick-start documentation for SecureAgentConfig/FIDES patterns.
python/samples/02-agents/security/FIDES_DEVELOPER_GUIDE.md	New deep-dive developer guide for the FIDES model and APIs.
python/packages/core/tests/test_security.py	New comprehensive unit tests for labeling/middleware/policy enforcement.
python/packages/core/agent_framework/_tools.py	Updates tool invocation + approval replacement behavior for policy approvals.
python/packages/core/agent_framework/init.py	Exports new FIDES/security APIs and `ai_function` alias.
python/packages/core/agent_framework/_feature_stage.py	Adds `ExperimentalFeature.FIDES`.
python/packages/core/agent_framework/observability.py	Hardens finish_reason attribute handling.
python/packages/devui/agent_framework_devui/_mapper.py	Emits policy violation details in approval request events when present.
python/packages/devui/agent_framework_devui/_executor.py	Preserves policy-violation metadata on approval responses.
python/packages/devui/tests/devui/test_ui_memory_regression.py	Skips the test when CDP websocket URL cannot be obtained.
python/packages/foundry/tests/foundry/test_foundry_embedding_client.py	Makes env patching deterministic via `clear=True`.
docs/features/FIDES_IMPLEMENTATION_SUMMARY.md	Adds an implementation summary for FIDES components/deliverables.
docs/decisions/0024-prompt-injection-defense.md	New ADR documenting the design rationale and tradeoffs.

github-actions

Automated Code Review

Reviewers: 1 | Confidence: 88%

✗ Design Approach

The approval-flow work mostly moves in the right direction—matching approved results by call_id is a solid improvement—but one new piece of the design is too brittle: _replace_approval_contents_with_results now detects “placeholder” tool results by searching for the magic substring [APPROVAL_PENDING] inside function_result.result. In this framework, function_result.result is just arbitrary tool output (python/packages/core/agent_framework/_types.py:809-865), so using it as control-flow state leaks middleware implementation details into the generic content model and can misfire on legitimate tool output. That should be modeled as structured state instead of string matching. I found two design-level problems in the new security layer. The feature’s isolation model depends on module-global mutable state for per-invocation context, so it will behave correctly only as long as one tool invocation runs at a time in a process. In the existing async tool pipeline, that assumption does not hold: overlapping runs can clober each other’s security context, and separately configured agents can silently overwrite each other’s quarantine client. The main design problem is in the new DevUI policy-approval path: the mapper now emits policy_violation metadata, but the existing pending-approval state and submit flow still only preserve request_id and function_call, so legitimate UI approvals will drop the metadata while _executor now tries to rebuild security-sensitive state from client input. That is both incomplete and a step away from the existing server-validated approval model. I also saw one test change that weakens coverage by turning DevTools startup failures into skips rather than failures. That means the example does not actually exercise the cross-turn security behavior it claims to demonstrate, and it teaches readers the wrong integration pattern for any policy that depends on accumulated conversation state.

Flagged Issues

python/samples/02-agents/security/email_security_example.py expects security context to carry across two independent agent.run() calls, but core agent code creates a new session when session= is omitted (_agents.py:1185-187, 1406-1408). Scenario 2 will not run in the tainted context from scenario 1. Reuse a single AgentSession or collapse into one run.

Suggestions

Represent pending-approval tool results with explicit metadata or a dedicated content shape, then replace them by call_id using that structured marker instead of inspecting function_result.result text.
Keep the quarantine chat client on SecureAgentConfig/middleware state and have quarantined_llm() obtain it from the current invocation context, preserving per-agent isolation.
Reuse a single AgentSession in the email sample's CLI scenario so the second step actually demonstrates policy enforcement after untrusted content has entered the conversation.

Automated review by eavanvalkenburg's agents

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 90%

✗ Correctness

The main code changes in _tools.py improve approval-flow robustness (null-checks, call_id-based matching instead of positional indexing, empty message cleanup). The observability.py fix is correct and necessary. However, there is one concrete regression: the normal success path of _auto_invoke_function silently drops additional_properties propagation from the function call to the function result, while every other code path in the same function (no-middleware direct execution, MiddlewareTermination, Exception, and error paths) continues to propagate them. This inconsistency means the same function behaves differently depending on whether middleware is present and whether execution succeds or fails. The _parse_github_mcp_labels function in security.py contains a correctness bug: it uses lexicographic string comparison (field_conf.value > most_restrictive_confidentiality.value) to determine the most restrictive confidentiality label. Because "private" < "public" in Python string ordering (since 'r' < 'u'), PRIVATE confidentiality from GitHub MCP labels is incorrectly treated as less restrictive than PUBLIC, causing PRIVATE data to remain labeled PUBLIC. This occurs at two locations (lines 764 and 782). The rest of the codebase (e.g., combine_labels, check_confidentiality_allowed) correctly uses a priority dict for this comparison.

✗ Security Reliability

The _tools.py changes improve approval-flow robustness (call_id-based matching instead of positional, null-safety on approved_function_call) and add security middleware integration points. Two reliability concerns: (1) the normal middleware success path silently drops additional_properties that error/termination paths still propagate, creating inconsistent metadata on function results; (2) placeholder replacement uses a magic string sentinel '[APPROVAL_PENDING]' in tool-result text, which is fragile—a structured Content attribute would be safer. The observability fix for non-string finish_reason is correct and needed. Two security-relevant bugs found in the new security.py module: (1) The GitHub MCP label parser uses lexicographic string comparison for confidentiality levels, causing PRIVATE content to be misclassified as PUBLIC because "private" < "public" lexicographically. This undermines confidentiality protections for GitHub MCP-sourced data. (2) Thread-local storage (threading.local()) is used to pass the middleware instance to async security tools, but threading.local() does not isolate concurrent asyncio coroutines on the same thread, risking cross-coroutine middleware leakage. The production code changes add policy-violation metadata plumbing through the approval flow. The _executor.py change spreads untrusted client-supplied dict data (**policy_violation_data) into additional_properties, which allows a malicious client to inject arbitrary keys or override the policy_violation: True flag. The _mapper.py changes safely extract specific keys. Test changes and foundry test fix are fine.

✗ Test Coverage

The PR introduces significant new logic in _tools.py — placeholder [APPROVAL_PENDING] replacement, empty-message cleanup, function_approval_request passthrough from middleware, and call_id null validation — but none of these new code paths have corresponding unit tests in test_function_invocation_logic.py. The observability isinstance(finish_reason, str) guard also lacks a test for non-string finish_reason values. While test_security.py is listed as a new file, it does not exist on disk and presumably appears in a later chunk; the gaps noted here are in the core _tools.py approval flow, which should be tested independently of the security module. The security module contains a correctness bug in _parse_github_mcp_labels where confidentiality levels are compared via string ordering ("private" < "public" alphabetically) instead of the intended hierarchy (PUBLIC < PRIVATE < USER_IDENTITY), causing PRIVATE data to not be recognized as more restrictive than PUBLIC. No test file for the security module exists on disk — test_security.py is listed as a changed file but appears in another chunk, so its coverage cannot be verified here. A test specifically covering _parse_github_mcp_labels with PRIVATE fields would catch this bug. The new test_security.py file is extensive (~2578 lines) and covers the major security subsystems well: label creation/serialization, combine_labels, ContentVariableStore, middleware label tracking, policy enforcement, auto-hiding, tiered label propagation, confidentiality checks, and quarantined LM. Three test coverage gaps stand out: (1) the devui _executor.py and _mapper.py changes add policy_violation extraction and mapping logic with zero corresponding tests, (2) test_audit_log_recording only asserts the initial empty state and never triggers a violation to verify recording, and (3) the new _replace_approval_contents_with_results tests in test_function_invocation_logic.py are well-structured and cover the key call-id-matching fix thoroughly.

✗ Design Approach

That is a contract regression rather than a local bug: the framework already treats additional_properties as the extensibility channel on Content, and the rest of _auto_invoke_function still preserves it. With middleware enabled, result metadata now disappears only on the success path, which is especially risky for any higher-level features that depend on result labels or other propagated metadata. First, it stores per-invocation middleware state in a thread-local even though the core tool loop executes function calls concurrently with asyncio.gather, so concurrent tool calls can read or clear each other’s security context. Second, the GitHub MCP label parser derives confidentiality ordering from enum string values instead of the explicit confidentiality lattice used elsewhere, which can silently downgrade private content during label propagation. I also found the new memory-test skip broad enough to hide real browser/DevTools startup regressions instead of isolating a known unsupported environment. I found two design-level problems in the new security samples. First, the email demo assumes the security context stays tainted across two separate agent.run(...) calls, but the core agent only preserves provider/history state when you reuse an explicit session, so the second step does not reliably demonstrate the policy boundary it claims to show. Second, both samples hardcode gpt-4o-mini as the quarantine Foundry model even though FoundryChatClient treats model as a deployment name, which makes the samples depend on a very specific project setup instead of a configurable quarantine deployment.

Flagged Issues

The [APPROVAL_PENDING] placeholder replacement logic in _replace_approval_contents_with_results (lines 1872–1882 of _tools.py) is entirely new behavior that modifies message contents in-place, but has zero test coverage across all test files.

Suggestions

Narrow the memory regression test skip — converting the generic DevTools timeout into pytest.skip() masks browser-launch and remote-debugging regressions this test is supposed to catch. Skip only for a narrowly identified unsupported-environment condition.
Reuse an explicit session in the email sample (e.g., session = agent.create_session() passed to both agent.run() calls) so the second step genuinely demonstrates cross-turn policy enforcement.
Make the quarantine deployment configurable (e.g., FOUNDRY_QUARANTINE_MODEL env var with fallback to FOUNDRY_MODEL) and update documentation to reference deployment names rather than a literal gpt-4o-mini string.
Add unit tests for quarantined_llm both with and without a quarantine client configured, verifying the response is labeled UNTRUSTED and tool_choice='none' is passed.

Automated review by moonbox3's agents

* fides integration * documentation * documentation * documentation * human-approval on policy violation * numenous hyena 'works' * IFC based implementation * minor edits in documentation * rebasing the branch and running the email example * Add security tests for IFC middleware * Fix Role.TOOL NameError in approval handling * tiered labelling scheme * 3 tier labelling scheme in middleware * Adapt security middleware to list[Content] tool results * Refactor SecureAgentConfig as context provider and address Copilot review comments * Update FIDES docs to reflect context provider pattern and update code for ContextProvider rename * Fix security examples: use OpenAIChatClient instead of non-existent AzureOpenAIChatClient * Address PR review: consolidate security modules, remove ContentLineage, update docs * remove unrelated files * remove comment from _tools.py and rename decision file * Fix CI failures: Bandit B110, broken md links, hosted approval passthrough * apply template to decision doc 0024 * minor fixes to decision doc 0024 --------- Co-authored-by: Aashish <t-akolluri@microsoft.com>

* Python: follow up FIDES security flow Refine the secure approval path, mark the security classes with the FIDES experimental feature label, and clean up the related docs/tests. Also fix workspace-level validation regressions uncovered while running the full Python check suite. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: remove FIDES GitHub MCP sample Drop the GitHub MCP security sample from the FIDES follow-up branch while keeping the remaining security docs and samples intact. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated import naming and comment from review * Add approval replay None call-id test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…nt in email_security_example (#5446) * Address PR review: fix paths and update FIDES implementation * Address PR comments and add session tracking in email example in samples * Fix session creation and resolve merge conflict in docstring example * Resolve merge conflict in docstring example

…ment (#5617) Adds test coverage for the second-pass logic in `_replace_approval_contents_with_results` that removes messages whose `contents` list becomes empty after first-pass content removal. Addresses review comment on PR #5331: #5331 (comment) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3 added documentation Improvements or additions to documentation python labels Apr 17, 2026

eavanvalkenburg requested a review from Copilot April 18, 2026 07:32

Copilot started reviewing on behalf of eavanvalkenburg April 18, 2026 07:32 View session

Copilot AI reviewed Apr 18, 2026

View reviewed changes

Comment thread docs/features/FIDES_IMPLEMENTATION_SUMMARY.md Outdated

Comment thread python/samples/02-agents/security/README.md Outdated

Comment thread python/samples/02-agents/security/README.md Outdated

Comment thread python/packages/core/agent_framework/_tools.py Outdated

eavanvalkenburg marked this pull request as ready for review April 22, 2026 10:44

github-actions Bot reviewed Apr 22, 2026

View reviewed changes

TaoChenOSU reviewed Apr 22, 2026

View reviewed changes

github-actions Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread python/packages/core/agent_framework/_tools.py

eavanvalkenburg self-assigned this Apr 28, 2026

eavanvalkenburg added this to Agent Framework Apr 28, 2026

eavanvalkenburg moved this to In Progress in Agent Framework Apr 28, 2026

eavanvalkenburg moved this from In Progress to Community PR in Agent Framework Apr 28, 2026

shrutitople and others added 5 commits May 4, 2026 09:57

Address PR review: fix paths and update FIDES implementation (#5352)

2607ba1

eavanvalkenburg force-pushed the feature/python-fides branch from 72adca2 to 9711562 Compare May 4, 2026 08:09

eavanvalkenburg mentioned this pull request May 4, 2026

Python: add test for empty-message pruning in approval result replacement #5617

Merged

2 tasks

eavanvalkenburg enabled auto-merge May 4, 2026 08:40

TaoChenOSU approved these changes May 4, 2026

View reviewed changes

giles17 approved these changes May 5, 2026

View reviewed changes

eavanvalkenburg added this pull request to the merge queue May 5, 2026

Merged via the queue into main with commit ddfbdf5 May 5, 2026
34 checks passed

github-project-automation Bot moved this from Community PR to Done in Agent Framework May 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: information-flow control prompt injection defense#5331

Python: information-flow control prompt injection defense#5331
eavanvalkenburg merged 6 commits into
mainfrom
feature/python-fides

eavanvalkenburg commented Apr 17, 2026

Uh oh!

moonbox3 commented Apr 17, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

eavanvalkenburg commented Apr 17, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

moonbox3 commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python Unit Test Overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Automated Code Review

✗ Design Approach

Flagged Issues

Suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Automated Code Review

✗ Correctness

✗ Security Reliability

✗ Test Coverage

✗ Design Approach

Flagged Issues

Suggestions

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

moonbox3 commented Apr 17, 2026 •

edited

Loading